-
Allen, Genevra (Ed.)
Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important statistical properties like consistency and asymptotic normality by considering subsampling in lieu of bootstrapping. Though such results open the door to traditional inference procedures, all formal methods suggested thus far place severe restrictions on the testing framework and their computational overhead often precludes their practical scientific use. Here we propose a hypothesis test to formally assess feature significance, which uses permutation tests to circumvent computationally infeasible estimates of nuisance parameters. This test is intended to be analogous to the F-test for linear regression. We establish asymptotic validity of the test via exchangeability arguments and show that the test maintains high power with orders of magnitude fewer computations. Importantly, the procedure scales easily to big data settings where large training and testing sets may be employed, conducting statistically valid inference without the need to construct additional models. Simulations and applications to ecological data, where random forests have recently shown promise, are provided.
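The abstract above does not spell out the procedure, so what follows is only a minimal sketch of one way a permutation test for feature significance could be organised, not the authors' implementation. It assumes that per-tree predictions on a common test set are already available from two ensembles: one trained on the original data and one trained with the features of interest scrambled. The function names (ensemble_mse, permutation_pvalue) and parameters (full_trees, reduced_trees, n_perm, seed) are hypothetical. Trees are shuffled between the two ensembles to build a reference distribution for the difference in test error, so no additional models are fitted.

```python
import random
from statistics import mean


def ensemble_mse(tree_preds, y):
    """Mean squared error of the averaged prediction of a list of trees.

    tree_preds: list of per-tree prediction lists, each the same length as y.
    """
    n_trees = len(tree_preds)
    avg = [sum(t[i] for t in tree_preds) / n_trees for i in range(len(y))]
    return mean((a - b) ** 2 for a, b in zip(avg, y))


def permutation_pvalue(full_trees, reduced_trees, y, n_perm=1000, seed=0):
    """One-sided permutation p-value for 'reduced ensemble is less accurate'.

    full_trees / reduced_trees: per-tree predictions from the two ensembles
    (assumed inputs; how the trees are trained is outside this sketch).
    """
    rng = random.Random(seed)
    # Observed statistic: how much worse the reduced ensemble predicts.
    observed = ensemble_mse(reduced_trees, y) - ensemble_mse(full_trees, y)

    pooled = list(full_trees) + list(reduced_trees)
    k = len(full_trees)
    exceed = 0
    for _ in range(n_perm):
        # Under the null hypothesis the trees are treated as exchangeable,
        # so they are reassigned to the two ensembles at random.
        rng.shuffle(pooled)
        stat = ensemble_mse(pooled[k:], y) - ensemble_mse(pooled[:k], y)
        if stat >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, which keeps
    # the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

When the two ensembles contain identical trees, every shuffled statistic equals the observed one and the returned p-value is 1, as expected when the scrambled features carry no predictive information.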
-
Summary: Bird species’ migratory patterns have typically been studied through individual observations and historical records. In recent years, the eBird citizen science project, which solicits observations from thousands of bird watchers around the world, has opened the door for a data-driven approach to understanding large-scale geographical movements. Here, we focus on the North American tree swallow (Tachycineta bicolor) occurrence patterns throughout the eastern USA. Migratory departure dates for this species are widely believed by both ornithologists and casual observers to vary substantially across years, but the reasons for this are largely unknown. In this work, we present evidence that maximum daily temperature is predictive of tree swallow occurrence. Because it is generally understood that species occurrence is a function of many complex, high order interactions between ecological covariates, we utilize the flexible modelling approach that is offered by random forests. Making use of recent asymptotic results, we provide formal hypothesis tests for predictive significance of various covariates and also develop and implement a permutation-based approach for formally assessing interannual variations by treating the prediction surfaces that are generated by random forests as functional data. Each of these tests suggests that maximum daily temperature is important in predicting migration patterns.
-
Abstract: Erosion of landscapes underlaid by permafrost can transform sediment and nutrient fluxes, surface and subsurface hydrology, soil properties, and rates of permafrost thaw, thus changing ecosystems and carbon emissions in high latitude regions with potential implications for global climate. However, future rates of erosion and sediment transport are difficult to predict as they depend on complex interactions between climatic and environmental parameters such as temperature, precipitation, permafrost, vegetation, wildfires, and hydrology. Thus, despite the potential influence of erosion on the future of the Arctic and global systems, the relations between erosion‐rate and these parameters, as well as their relative importance, remain largely unquantified. Here we quantify these relations based on a sedimentary record from Burial Lake, Alaska, one of the richest datasets of Arctic lake deposits. We apply a set of bi‐ and multi‐variate techniques to explore the association between the flux of terrigenous sediments into the lake (a proxy for erosion‐rate) and a variety of biogeochemical sedimentary proxies for paleoclimatic and environmental conditions over the past 25 cal ka BP. Our results show that erosion‐rate is most strongly associated with temperature and vegetation proxies, and that erosion‐rate decreases with increased temperature, pollen‐counts, and abundance of pollen from shrubs and trees. Other proxies, such as those associated with fire frequency, aeolian dust supply, mass wasting and hydrologic conditions, play a secondary role. The marginal effects of the sedimentary‐proxies on erosion‐rate are often threshold dependent, highlighting the potential for strong non‐linear changes in erosion in response to future changes in Arctic conditions.